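software cannot compute the model, causing an error. In a logistic regression model, as discussed in
Chapter 18, each time you add a covariate, you increase (or at least do not decrease) the overall
likelihood of the model. In Chapter 17, which focuses on ordinary least-squares regression, adding a
covariate increases (or at least does not decrease) the regression sum of squares. Each added covariate
also uses up a degree of freedom, so this automatic improvement in fit does not by itself show that the
covariate is worth keeping.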
What this means is that you don’t want to add covariates to your model that just increase error and
don’t help with the overall goal of model fit. A good strategy is to try to find the best collection of
covariates that together account for as much of the error as possible. Think of it like roommates
who share apartment-cleaning duties. It’s best if they split up the apartment and each clean different
parts of it, rather than insisting on cleaning up the same rooms, which would be a waste of time. The
term parsimony refers to trying to include the fewest covariates in your regression model that explain
the most variation in the dependent variable. The modeling approaches discussed in the next section
explain ways to develop such parsimonious models.
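The point that adding a covariate never makes the in-sample fit worse can be checked with a small simulation. In ordinary least squares, the total sum of squares is fixed by the data, so adding a covariate can only move variation from the residual sum of squares (RSS) into the regression sum of squares. The sketch below is a minimal pure-Python illustration on simulated data (the variable names, sample size, and seed are arbitrary assumptions): it fits the model with and without a covariate that is pure noise by construction. It also reports AIC, used here only as an example of a fit statistic that charges for each extra coefficient; it is not necessarily the statistic used in the regression chapters.

```python
import math
import random


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def residual_ss(columns, y):
    """OLS fit of y on an intercept plus the given columns; returns the residual sum of squares."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    return sum((yi - sum(r[a] * beta[a] for a in range(k))) ** 2 for r, yi in zip(X, y))


def aic(rss, n, k):
    """Gaussian-likelihood AIC up to an additive constant; k = number of estimated coefficients."""
    return n * math.log(rss / n) + 2 * k


rng = random.Random(7)  # fixed seed so the simulation is reproducible
n = 100
exposure = [rng.gauss(0, 1) for _ in range(n)]
noise_cov = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to the outcome by construction
y = [1.0 + 0.8 * x + rng.gauss(0, 1) for x in exposure]

rss_base = residual_ss([exposure], y)
rss_noise = residual_ss([exposure, noise_cov], y)
print(f"RSS: {rss_base:.3f} -> {rss_noise:.3f} after adding the noise covariate")
print(f"AIC: {aic(rss_base, n, 2):.3f} -> {aic(rss_noise, n, 3):.3f}")
```

The RSS comparison always goes the same way, because least squares over more columns can only fit at least as well. Whether AIC rises depends on the simulated noise, which is the parsimony question in miniature: is the improvement large enough to pay for the extra coefficient?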
Adjusting for confounders
When designing a regression analysis, you first have to decide: Are you doing an exploratory analysis,
or are you doing a hypothesis-driven analysis? If you are doing an exploratory analysis, you do not
have a presupposed hypothesis. Instead, your aim is to answer the research question, “What group of
covariates do I need to include as independent variables in my regression to predict the outcome and
get the best model fit?” In this case, you need to select a set of candidate covariates and then come up
with modeling rules to decide which groups of covariates produce the best-fitting model. In each
chapter on regression in this book, we provide methods of comparing models using model-fit statistics.
You would use those to choose your final model for your exploratory analysis. Exploratory analyses
are considered descriptive studies and are weak study designs (see Chapter 7).
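For a small exploratory analysis, one brute-force way to use a model-fit statistic to choose the final model is to fit every subset of the candidate covariates and rank the fits. The sketch below does this in pure Python for three simulated candidates. It assumes AIC as the fit statistic and a made-up data-generating process (the outcome depends on x1 and x2 only); the names x1, x2, and x3 are hypothetical.

```python
import itertools
import math
import random


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def residual_ss(columns, y):
    """OLS fit of y on an intercept plus the given columns; returns the residual sum of squares."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    return sum((yi - sum(r[a] * beta[a] for a in range(k))) ** 2 for r, yi in zip(X, y))


rng = random.Random(11)  # fixed seed for reproducibility
n = 200
candidates = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
# Simulated outcome: depends on x1 and x2 only (an assumed data-generating process)
y = [2.0 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(candidates["x1"], candidates["x2"])]

results = []  # (AIC, subset) for every subset, including the intercept-only model
names = sorted(candidates)
for size in range(len(names) + 1):
    for subset in itertools.combinations(names, size):
        rss = residual_ss([candidates[c] for c in subset], y)
        k = len(subset) + 1  # coefficients, counting the intercept
        results.append((n * math.log(rss / n) + 2 * k, subset))

results.sort()  # lowest AIC first
for score, subset in results:
    print(f"AIC {score:9.3f}  covariates: {subset or '(intercept only)'}")
best_aic, best_subset = results[0]
```

With p candidates there are 2^p subsets to fit, so this exhaustive search quickly becomes impractical as the candidate list grows.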
But if you collected your data based on a hypothesis, you are doing a hypothesis-driven analysis.
Epidemiologic studies require hypothesis-driven analyses. In these, you have already selected your
exposure and outcome, and you now need to fit a regression model that predicts the outcome and
includes the exposure and confounders as covariates. You know you need to include the exposure
and the outcome in every model you run. However, you may not know how to decide which
confounders stay in the model.
Regardless of whether you are doing exploratory or hypothesis-driven modeling, you need to
set rules before you start modeling that describe how you will make decisions during the modeling
process and about your final model. For example, you may make a rule that every covariate in your final
model must have a p value that is statistically significant at α = 0.05. You can make
other stipulations about the final model or about the process of reaching it. What is
important is that you make the modeling rules and write them down before you start modeling.
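One way to make the rules truly “written down” is to record them in code before fitting anything, so that checking a candidate final model becomes mechanical. This minimal sketch encodes the α = 0.05 rule from the text plus an assumed rule that the exposure is always kept, as in a hypothesis-driven analysis. The class name, field names, and p values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the rules cannot be edited after they are set
class ModelingRules:
    """Decision rules fixed before any model is fit (field names are hypothetical)."""
    alpha: float = 0.05               # every retained covariate must have p < alpha
    forced_in: tuple = ("exposure",)  # assumed rule: these stay regardless of p value


def violations(rules, p_values):
    """Return covariates in a candidate final model that break the pre-set p-value rule."""
    return [name for name, p in p_values.items()
            if name not in rules.forced_in and p >= rules.alpha]


RULES = ModelingRules()  # set once, before modeling starts

# Hypothetical p values from a candidate final model, for illustration only
candidate = {"exposure": 0.21, "age": 0.003, "smoking": 0.048, "bmi": 0.31}
print(violations(RULES, candidate))
```

Here only bmi breaks the rule; the exposure is exempt because it is forced into every model.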
You then need to choose a modeling approach: the procedure you will use to determine which
candidate confounders stay in the model with the exposure and which ones are removed. There are
three common approaches in regression modeling (although many analysts use their own customized
approaches). These approaches don’t have official names, but we refer to them by their common
names: forward stepwise, backward elimination, and stepwise selection.
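To make one of these approaches concrete, here is a minimal sketch of backward elimination for a hypothesis-driven analysis. The sketch starts with the exposure and all candidate confounders, drops the confounder with the largest p value if that p value is not below α, refits, and repeats; the exposure is never eligible for removal. Assumptions to note: the data are simulated (age, smoking, and bmi are labels only), the p values use a large-sample normal approximation to the t distribution rather than exact t tests, and the stopping rule is the α = 0.05 example from the text. It is a sketch, not necessarily the exact procedure described in the next section.

```python
import math
import random


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_pvalues(columns, y):
    """OLS of y on an intercept plus columns; two-sided p values for the non-intercept terms.
    Uses a normal approximation to the t distribution (reasonable only for large n)."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(r[a] * beta[a] for a in range(k))) ** 2 for r, yi in zip(X, y))
    sigma2 = rss / (n - k)  # residual variance estimate
    return [math.erfc(abs(beta[j] / math.sqrt(sigma2 * inv[j][j])) / math.sqrt(2))
            for j in range(1, k)]


def backward_elimination(exposure, confounders, y, alpha=0.05):
    """Drop the least significant confounder and refit until all remaining ones have p < alpha.
    The exposure is always in the model and is never a candidate for removal."""
    kept = dict(confounders)
    while kept:
        names = list(kept)
        pvals = ols_pvalues([exposure] + [kept[c] for c in names], y)
        conf_p = dict(zip(names, pvals[1:]))  # pvals[0] belongs to the exposure
        worst = max(conf_p, key=conf_p.get)
        if conf_p[worst] < alpha:
            break  # every remaining confounder meets the rule
        del kept[worst]
    return list(kept)


rng = random.Random(3)  # fixed seed for reproducibility
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
smoking = [rng.gauss(0, 1) for _ in range(n)]
bmi = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to exposure and outcome by construction
exposure = [0.6 * a + 0.4 * s + rng.gauss(0, 1) for a, s in zip(age, smoking)]
y = [0.5 * e + 0.8 * a + 0.7 * s + rng.gauss(0, 1) for e, a, s in zip(exposure, age, smoking)]

selected = backward_elimination(exposure, {"age": age, "smoking": smoking, "bmi": bmi}, y)
print("Confounders kept with the exposure:", selected)
```

In this simulation, age and smoking affect both the exposure and the outcome, so they behave like true confounders and should survive; bmi is pure noise and is usually, though not always, removed.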